A HowNet-based Semantic Relatedness Kernel for Text Classification
نویسنده
چکیده
The exploitation of the semantic relatedness kernel has always been an appealing subject in the context of text retrieval and information management. Typically, in text classification the documents are represented in the vector space using the bag-of-words (BOW) approach. The BOW approach does not take into account the semantic relatedness information. To further improve the text classification performance, this paper presents a new semantic-based kernel of support vector machine algorithm for text classification. This method firstly using CHI method to select document feature vectors, secondly calculates the feature vector weights using TF-IDF method, and utilizes the semantic relatedness kernel which involves the semantic similarity computation and semantic relevance computation to classify the document using support vector machines. Experimental results show that compared with the traditional support vector machine algorithm, the algorithm in the text classification achieves improved classification F1-measure.
منابع مشابه
A Knowledge-Based Semantic Kernel for Text Classification
Typically, in textual document classification the documents are represented in the vector space using the “Bag of Words” (BOW ) approach. Despite its ease of use, BOW representation cannot handle word synonymy and polysemy problems and does not consider semantic relatedness between words. In this paper, we overcome the shortages of the BOW approach by embedding a known WordNet-based semantic re...
متن کاملA Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researches to use...
متن کاملSemantic smoothing for text clustering
In this paper we present a new semantic smoothing vector space kernel (S-VSM) for text documents clustering. In the suggested approach semantic relatedness between words is used to smooth the similarity and the representation of text documents. The basic hypothesis examined is that considering semantic relatedness between two text documents may improve the performance of the text document clust...
متن کاملA Method for Chinese Short Text Classification Considering Effective Feature Expansion
This paper presents a Chinese short text classification method which considering extended semantic constraints and statistical constraints. This method uses “HowNet” tools to build the attribute set of concept. when coming to the part of feature expansion, we judge the collocation between the attribute words of original text and the characteristics before and after expansion as the semantic con...
متن کاملText Relatedness Based on a Word Thesaurus
The computation of relatedness between two fragments of text in an automated manner requires taking into account a wide range of factors pertaining to the meaning the two fragments convey, and the pairwise relations between their words. Without doubt, a measure of relatedness between text segments must take into account both the lexical and the semantic relatedness between words. Such a measure...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013